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Abstract 

The  GMRES  and  Arnoldi  algorithms,  which  reduce  to  the  CR  and  Lanczos  al- 
gorithms in  the  symmetric  case,  both  minimize  ||p(>l)6||  over  polynomials  p  of 
degree  n.  The  difference  is  that  p  is  normalized  at  z  —  0  for  GMRES  and  at  c  =  oo 
for  Arnoldi.  Analogous  "ideal  GMRES"  and  "ideal  Arnoldi"  problems  are  obtained 
if  one  removes  b  from  the  discussion  and  minimizes  ||p(>l)  ||  instead.  Investigation 
of  these  true  and  ideal  approximation  problems  gives  insight  into  how  fast  GMRES 
converges  and  how  the  Arnoldi  iteration  locates  eigenvalues. 
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1.  Introduction 

Since  the  1950s  it  has  been  recognized  that  matrix  iterative  methods  are 
naturally  connected  with  approximation  theory.  The  most  familiar  connections 
are  between  polynomial  approximation  and  the  numerous  iterative  methods  that 
make  use  of  Krylov  subspaces,  including  the  Richardson,  Chebyshev,  conjugate 
gradient,  biconjugate  gradient,  CGNR,  GMRES,  CGS,  Bi-CGSTAB,  and  QMR 
iterations.  Sometimes  rational  approximation  problems  also  arise,  notably  in  the 
analysis  of  ADI  iterations,  circulant-preconditioned  Toeplitz  iterations,  and  Krylov 
subspace  algorithms  via  Fade  approximation.  Recent  references  on  these  matters 
include  [4,8,15,21]. 

The  approximation  problems  that  are  discussed  in  the  linear  algebra  literature 
almost  invariably  involve  scalar  functions  defined  on  subsets  of  the  complex  plane, 
or,  if  the  matrix  A  is  symmetric,  the  real  axis.  The  set  in  question  is  the  spectrum 
A  (/I)  or  an  estimate  of  the  spectrum.  If  A  is  normal,  such  reductions  are  sometimes 
exact  in  the  sense  that  the  behavior  of  the  matrix  iteration  is  determined  exactly 
by  the  properties  of  the  approximation  problem.  If  A  is  not  normal,  however,  they 
are  always  approximate.  GMRES,  for  example,  does  not  exactly  solve  any  known 
approximation  problem  in  the  complex  plane,  when  A  is  not  normal.  Between  the 
approximation  problem  and  the  convergence  of  the  matrix  iteration  there  is  a  gap 
of  size  k(I'),  the  condition  number  of  a  matrix  of  eigenvectors  of  A.  When  k.{V) 
is  large,  predictions  based  on  the  approximation  problem  may  have  little  bearing 
on  the  actual  convergence  of  the  matrix  algorithm  [15]. 

The  purpose  of  this  paper  is  to  explore  a  different  kind  of  approximation 
problem  that  can  also  be  associated  with  iterative  linear  algebra,  involving  matrices 
instead  of  scalars.  Instead  of  asking  how  small  a  polynomial  p{z)  can  be  on  the  set 
A(y4),  we  ask  how  small  the  norm  ||p(/l)||  can  be.  Matrix  approximation  questions 
are  implicit  in  much  of  the  literature  of  matrix  iterations;  we  certainly  do  not  claim 
to  be  the  first  to  consider  them.  However,  they  have  received  no  discussion  in  print 
that  we  are  aware  of.  We  believe  it  is  important  to  investigate  these  problems  if 
one's  goal  is  an  understanding  of  matrix  iterations  that  does  not  depend  upon 
hidden  assumptions  of  near-normality. 

We  shall  concentrate  on  two  algorithms  for  nonsymmetric  matrix  problems: 
GMRES,  which  solves  systems  of  equations  Ax  =  b,  and  Arnoldi,  which  computes 
eigenvalues  of  A.  Our  matrix  approximation  analogues  of  these  processes  are  called 
the  "ideal  GMRES"  and  "ideal  .'Xrnoldi'"  problems.  Mathematically,  the  new  result 
presented  here  is  a  proof  of  the  existence  and  uniqueness  of  ideal  GMRES  and 
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Arnoldi  approximants.  (Existence  is  trivial,  but  uniqueness  is  surprisingly  tricky.) 
In  the  final  section  we  propose  four  questions,  as  yet  unresolved,  whose  solution 
might  further  advance  our  understanding  of  matrix  iterations. 

2.  GMRES  and  Arnoldi 

Throughout  this  paper  A'  and  n  <  N  are  integers,  A  is  an  A^  x  A^  matrix,*  b  is 
an  A^-vector,  ||  •  |1  is  the  2-norm,  and 

P^  =  {polynomials  of  degree  <  n  with  p(0)  =  1}, 
P"  =  {monic  polynomials  of  degree  n}. 

The  difference  between  P„  and  P"  is  that  P„  is  normalized  at  r  =  0  and  P"  at 
z  =  oo. 

GMRES  [4,19]  is  an  algorithm  that  solves  the  following  approximation  problem'^ 
successively  for  n  =  1 , 2, 3 . . . : 


GMRES  approximation  problem.    Find  p^  6  P„  such  that 

II  p,  (.4)  6 II  =  minimum.  (1' 


An  equivalent  statement  is 

b^  (Ab,A-b,....A''b),  ■  (1') 

where  "y  ~  V  denotes  the  problem  of  finding  the  best  approximation  with  respect 
to  II  •  II  to  the  point  y  in  the  space  V''.  This  characterization  of  GMRES  is  well 
known.  To  explain  it  one  notes  that  GMRES  finds  a  vector  x„  in  the  Krylov 
subspace  AC„  =  (6,  Ab,  ...  ,  A"~^b)  such  that  the  residual  r„  =  fe—  ,4x„  has  minimal 
norm  over  all  x  6  A^„.  This  vector  x„  can  be  represented  in  the  form  x„  =  (}{^)b 
for  some  polynomial  q{z)  of  degree  n  — 1,  and  (1)  comes  upon  writing  r„  =p^{A)b 
^^'[thp,{z)  =  l-zq{z)eP„. 

The  Arnoldi  iteration  [1,4]  is  an  algorithm  that  solves  the  analogous  problem 
involving  P"  instead  of  P„: 


*  Nothing  essential  changes  if  we  take  N  =  oc  and  let  y4  be  a  bounded  operator. 

T  We  have  assumed  that  the  initial  guess  for  the  iteration  is  iq  =  0.  Arbitrary  initial  guesses  can  be  handled 
by  an  easy  modification 


Arnoldi  approximation  problem.    Find  p*  €  P^  such  thai 

\\p*{A)b\\  —  nnnunum.  (2) 


Equivalently, 

>1"6«  (6, /16,...,/l"-^6).  (2') 

The  vector  6  is  no  longer  the  right-hand  side  of  a  system  of  equations,  but  an 
arbitrary  initial  vector.  This  characterization  is  also  known,  but  not  so  widely 
known.*  It  can  be  readily  proved  as  a  consequence  of  the  usual  formulation  of  the 
Arnoldi  process  in  terms  of  orthogonality.  The  .'Arnoldi  iteration  "finds  p*"  in  the 
sense  that  it  constructs  a  Hessenberg  matrix  //„  of  which  p*  is  the  characteristic 
polynomial. 

If  A  is  real  and  symmetric,  the  GMRES  and  Arnoldi  iterations  reduce  to  the 
conjugate  residual  (CR)  and  Lanczos  iterations,  respectively.  Everything  said  in 
this  paper  about  GMRES  applies  also  to  CR,  and  everything  said  about  Arnoldi 
applies  also  to  Lanczos. 

A  comparison  of  (1)  and  (2)  suggests  that  from  an  approximation  point  of 
view,  the  difference  between  the  GMRES  and  Arnoldi  algorithms  is  slight.  This 
analogy  is  rarely  brought  out  in  accounts  of  these  algorithms,  partly  for  historical 
reasons  and  partly  because  the  usual  applications  of  the  two  algorithms  are  differ- 
ent. Whereas  GMRES  is  applied  to  solve  systems  of  equations  Ax  =  6,  so  that  (1) 
comes  quickly  to  mind  as  a  description  of  it,  the  Arnoldi  iteration  is  traditionally 
thought  of  as  a  method  for  estimating  eigenvalues  of  A.  The  ".Arnoldi  eigenvalue 
estimates''^  at  step  n  are  the  eigenvalues  of  //„,  that  is.  the  roots  of  p*.  But  what 
the  Arnoldi  iteration  actually  does  is  solve  (2);  its  connection  with  eigenvalues  is 
indirect  and  approximate. 

The  formulations  (1)  and  (2)  provide  elegant  proofs  of  certain  well-known 
properties  of  the  GMRES  and  Arnoldi  iterations.  For  example,  one  sees  imme- 
diately from  (1)  and  (2)  that  both  of  these  iterations  are  essentially  invariant 
under  changes  of  scale  {A  — +  a  A,  q  €  C)  and  under  unitary  similarity  transforma- 
tions [A  — >  UAU*,  U*  =  U~^).   The  Arnoldi  iteration  is  also  translation-invariant 


*ln  the  symmetric  (Lanczos)  case  (2)  appears  as  Corollary  12-3-7  of  [17],  unfortunately  with  a  typograph- 
ical error. 

'The  roots  of  ;;'  are  Ritz  values  of  A  with  respect  to  tiie  Krylov  subspace  (2').  By  analogy,  the  roots  of 
p,  are  sometimes  called  '•pseudo-Ritz  values"  [4].  The  Ritz  values  lie  in  the  field  of  values  of  A,  and  the 
pseudo-Ritz  values  lie  in  the  inverse  of  the  field  of  values  of  A~   ;  see  [13]. 


{A—^A  +  al,  Q  €  C),  since  oo  is  translation-invariant,  but  GMRES  is  not,  since 
0  is  not.  In  these  statements  and  throughout  this  paper,  we  ignore  the  effects  of 
rounding  error. 

3.  Ideal  GMRES  and  Ideal  Arnoldi 

The  GMRES  and  Arnoldi  iterations  depend  on  the  starting  vector  6.  However, 
one  may  remove  b  from  the  discussion  and  pose  the  following  "ideal"  approximation 
problems: 


Ideal  GMRES  approximation  problem.    Find  g,  €  P„  such  that 

II  9»('4)  II  =  minimum,  (3) 


or  equivalent!}', 

7;^  (.4,  A-...., /I").  (3') 


Ideal  Arnoldi  approximation  problem.    Find  q*  G  -P"  such  that 

\\  q*{  A)  \\  =  minimum,  {■i] 


that  is, 

A"  «(/,/! 4"-').  (4') 

Whereas  (1)  and  (2)  are  vector  approximation  problems,  (3)  and  (4)  involve  ma- 
trices. Procedures  for  computing  these  polynomials,  either  actual  or  in  our  imagi- 
nations, might  be  called  ideal  GMRES  and  ideal  Arnoldi  algorithms.  Some  com- 
putations of  this  kind  are  discussed  in  §6. 

We  believe  that  studying  these  idealized  problems  is  a  fruitful  way  to  gain 
insight  into  the  properties  of  Krylov  subspace  iterations  in  linear  algebra.  Our 
reasoning  is  as  follows.  The  behavior  of  a  GMRES  or  .Arnoldi  iteration  is  deter- 
mined by  two  things:  A  and  6.  However,  though  the  special  properties  of  b  are 
occasionally  important,  more  often  the  features  that  one  cares  about  do  not  differ 
very  much  from  one  choice  of  6  to  another.  It  is  the  properties  of  A  that  usually 
decide  between  an  iteration  that  converges  in  10  steps  and  one  that  requires  100 
or  1000  (which  in  practice  means  it  is  time  to  look  for  a  better  preconditioner).  By 
passing  from  (l)-(2)  to  (3)-(4)  we  disentangle  this  matrix  essence  of  the  process 


from  the  distracting  effects  of  the  initial  vector,  and  end  up  with  a  pair  of  elegant 
mathematical  problems  in  the  bargain. 

The  solutions  to  (l)-(2)  and  (3)-(4)  are  related  by  the  following  bounds. 
The  proof  of  this  theorem  is  easy;  the  four  inequalities  follow  from  the  minimality 
properties  (1),  (3),  (2),  and  (4),  respectively. 


Theorem  1.   The  true  and  ideal  GMRES  polynomials  are  related  by 

^^^^<hAA)\\<\\pM)\l  (5) 

and  the  true  and  ideal  Arnoldi  polynomials  are  related  by 

\\p'(A)h\ 


\b\ 


<  \\q*{A)\\  <  \\p'{A)\\.  (6) 


These  two  pairs  of  bounds  are  identical  in  form,  but  from  the  point  of  view  of 
applications,  the  nature  of  the  relationship  between  (4)  and  (2)  is  quite  different 
from  that  between  (3)  and  (1).  The  purpose  of  GMRES  is  to  solve  (1).  not  (3). 
The  relevance  of  (3)  is  that  it  gives  an  upper  bound  on  how  slow  the  convergence 
may  be,  thanks  to  (5),  and  if  the  right-hand-side  b  is  "random  enough"  one  may 
expect  that  this  bound  will  be  close  to  sharp.  For  an  .Arnoldi  iteration  aimed  at 
estimating  eigenvalues,  the  logic  is  reversed.  It  is  our  opinion  that  the  essence  of 
the  process  by  which  an  .Arnoldi  iteration  locates  eigenvalues  is  the  solution  not 
of  (2)  but  of  (4).  The  iteration  solves  (2)  beca.se  that  is  what  is  computationally 
tractable,  but  the  implicit  hope  is  that  if  b  is  "random  enough,"  the  solution  to 
(2)  will  be  a  good  approximation  to  the  solution  to  (4).  See  [23]. 

The  ideal  Arnoldi  polynomial  q'  might  be  called  the  degree  n  Chebyshev 
polynomial  of  A,  in  analogy  to  the  notion  in  approximation  theory  of  a  Chebyshev 
polynomial  of  a  subset  of  the  complex  plane,  which  is  a  monic  polynomial  that 
achieves  minimal  sup-norm  on  that  set.*  Another  way  to  view  </*  is  as  a  pseudo- 
annihilating  polynomial  for  A.  i.e.,  a  monic  polynomial  that  maps  .4  to  a  matrix 
of  norm  e^O.  According  to  the  usual  dehnition,  an  annihilating  polynomial  is 
a  polynomial  that  annihilates  A  exactly.  This  is  a  fragile  concept,  however,  ill- 
posed  with  respect  to  perturbations  of  .4  and  with  little  quantitative  force.  Krylov 
subspace  iterations  in  numerical  linear  algebra  are  founded  on  the  observation  that 


*See  Chapter  16  of  [9].  This  analogy  becomes  an  identity  if  A  is  normal;  see  the  next  section. 


for  practical  purposes,  a  pseudo-annihilating  i)olynomial  with  parameter  e  =  10~'° 
or  10~*°  is  as  useful  as  an  exact  annihilating  polynomial  and  may  be  of  vastly 
lower  degree. 

4.  The  special  case  when  A  is  normal 

When  A  is  normal,  problems  (l)-(4)  reduce  to  standard  problems  of  approx- 
imation theory.  For  any  vector  b  we  have 

N 

where  {Xj}  and  {v^}  are  a  set  of  eigenvalues  and  corresponding  orthonormal  eigen- 
vectors of  y4,  and  therefore 


\\p(A)b\\ 


Thus  the  Arnold!  and  GMRES  problems  (l)-(2)  are  equivalent  to  weighted  least- 
squares  approximation  problems  in  the  complex  plane:  find  an  appropriately  nor- 
malized polynomial  p{z)  that  has  minimal  weighted  2-norm  on  A{A)  with  respect 
to  the  discrete  weight  function  {|u.'j|-}.  .As  for  the  ideal  Arnoldi  and  GMRES 
problems  (3)-(4),  the  identity 

||;>(.4)||  =     sup    |p(A)| 
xeMA) 

(for  normal  matrices  only)  shows  that  they  are  are  equivalent  to  Chebyshev  ap- 
proximation problems  in  the  complex  plane:  find  a  polynomial  that  has  minimal 
supremum  norm  on  A(A).  In  particular,  the  ideal  Arnoldi  polynomial  q^  is  exactly 
the  same  as  the  Chebyshev  polynomial  for  the  set  A(.4),  mentioned  in  the  last 
section. 

Both  weighted  least-squares  and  Chebyshev  approximation  problems  in  the 
complex  plane  are  well  understood  and  discussed  in  many  books.  Existence  and 
uniqueness  of  best  approximations  are  easily  proved  (we  defer  a  precise  statement 
to  the  next  section).  In  the  Chebyshev  case  the  computation  of  best  approxima- 
tions can  be  carried  out  by  various  methods  such  as  linear  programming,  variations 
of  the  Remes  algorithm,  Lawson's  algorithm,  or  other  ideas;  see  [20]  and  the  ref- 
erences therein.  In  the  least-squares  case  the  computation  of  best  approximants  is 


a  matter  of  routine  linear  algebra.  One  can  use  a  QR  decomposition,  for  example, 
and  that  is  exactly  what  GMRES  does. 

There  are  two  reasons  to  pay  special  attention  to  the  case  in  which  A  is  normal. 
First,  some  matrices  are  normal  or  close  enough  to  normal  that  the  results  one 
obtains  are  sometimes  applicable  in  practice.  In  particular,  the  conjugate  residual 
and  Lanczos  iterations  fall  in  this  category  since  symmetric  matrices  are  normal. 
Second,  most  people's  intuitions  about  the  behavior  of  matrices  are  based  on  the 
normal  case.  By  studying  how  the  normal  case  differs  from  the  general  one  we 
obtain  a  veduable  check  on  our  intuitions. 

5.  Existence  and  uniqueness 

We  return  now  to  problems  (l)-(4)  for  matrices  A  that  are  arbitrary,  i.e., 
not  necessarily  normal.  The  most  fundamental  questions  to  be  asked  about  (1)- 
(4)  are  those  of  existence  and  uniqueness.  For  (1)  and  (2)  the  answers  to  both 
are  straightforward  and  well  known.  For  (3)  and  (4),  existence  is  straightforward 
but  uniqueness  is  not.  So  far  as  we  are  aware,  though  this  seems  surprising,  the 
uniqueness  of  the  solutions  to  (3)  and  (4)  is  a  new  result. 


Theorem  2.  The  optima!  polynomials  p,,  p* ,  q^.  q*  all  exist.  Provided  that  the 
minima  in  (l)-(4)  are  nonzero,  and  provided  in  the  cases  of  GMRES  and  ideal 
GMRES  that  A  is  nonsingular.  they  are  unique. 


Proof.  Consider  the  formulations  (l')-(4').  In  each  case  we  have  a  problem  of 
the  form  y  ^  V,  where  V  is  a  finite-dimensional  subspace  of  a  vector  space  \V  and 
y  6  W.  (For  (1)  and  (2),  V  and  W  are  spaces  of  A'-vectors,  whereas  for  (3)  and  (4) 
they  are  spaces  of  A'^  x  A'  matrices;  generically  we  can  speak  of  '"vectors"  in  either 
case.)  Existence  of  a  closest  point  r  €  V  to  y  follows  by  a  standard  compactness 
argument  that  can  be  found  in  any  book  on  approximation  theory.  See  for  example 
p.  20  of  [2]  or  p.  17  of  [12]. 

The  question  of  uniqueness  of  the  polynomials  p, .  p* .  </.,  q*  can  be  divided 
into  two  parts: 

(a)  Is  the  closest  point  r  6  V  to  y  unique? 

(b)  Does  it  have  a  unique  representation  as  a  linear  combination  of  the  n  vectors 
indicated  in  (l'H4')? 

Part  (b)  can  be  dispatched  as  follows.  What  we  have  to  show  is  that  the  n  vectors 
in  question  are  linearly  independent.     Consider  first  the  ideal  .^rnoldi  problem 


(4'),  and  suppose  to  the  contrary  that  I ,A,..  .,A"~^  are  linearly  dependent.  Then 
by  multiplying  by  a  power  of  A  if  necessary  we  can  find  a  linear  combination  of 
them  that  is  zero  and  in  which  the  coefficient  of  A"~^  is  nonzero.  Thus  A"~^  € 
(7,^,  ...,>l"-2),  whichimplies^"€(/l, /l^---,^""^)  and  therefore  ||9*(/l)||  =  0, 
contradicting  the  assumption  that  the  minimum  in  (4)  is  nonzero.  An  analogous 
argument  applies  to  (2').  For  the  GMRES  problem  (3'),  suppose  A,A^,...,A^  are 
linearly  dependent,  which  by  a  similar  argument  implies  A^  €  {A,  A",  ...  ,A"~^) 
and  thus  again  ||9*(i4)||  =  0.  If  the  constant  term  of  q*  is  nonzero,  then  dividing 
by  that  constant  term  yields  a  properly  normalized  GMRES  polynomial  q,  with 
||g,(>l)||  =  0.  On  the  other  hand  if  the  constant  term  is  zero,  then  since  A  is 
nonsingular  we  can  multiply  by  A~^  one  or  more  times  until  it  becomes  nonzero. 
An  analogous  argument  applies  to  (!'). 

This  brings  us  to  part  (a)  of  the  proof  of  uniqueness.  For  problems  (!')  and 
(2')  the  uniqueness  of  v  follows  by  another  standard  result  in  approximation  theory, 
since  the  vector  norm  ||-||  is  strictly  convex.  See,  for  example,  p.  23  of  [2]  or  p.  17  of 
[12].  The  matrix  norm  ||-||,  however,  is  not  strictly  convex,  so  a  different  argument 
is  required  for  (3')  and  (4').  One  must  wet  one's  hands  in  the  linear  algebra. 

Consider  first  the  ideal  Arnoldi  problem  (4).  Suppose  that  q^  and  qo  are  two 
distinct  solutions  to  (4),  and  let  the  minimal  norm  they  attain  be 

\\q,{A)\\  =  \\q2{A)\\^C. 

If  we  define  9(z)  =  ^(9i(c)  +  72(~)),  then  ||9(.4)||  <  C,  so  we  must  have  ||g(>l)||  =  C 
since  gj  and  q^  are  minimal.  Let  w-i,...,Wj  be  a  set  of  maximal  right  singular 
vectors  for  q{A).  i.e.,  a  set  of  orthonormal  vectors  with 

\\q{A)Wj\\  =  C,  1<J<J, 

with  J  as  large  as  possible.  For  each  w.  we  must  have 

WqMHW  =  \\q2{A)Wj\\  =  C 

and 

qi(A)Wj  =  q2(A)Wj, 

for  otherwise,  by  the  strict  convexity  of  the  vector  norm  ||-||,  we  would  have 
\\q{A)Wj\\<C.  Thus 

{q,-q,)(A)Wj=0.  1<J<J. 
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Now  since  (92~92)('^)  ^^  "°^  identically  zero,  we  can  multiply  it  by  a  scalar  and  a 
suitable  power  of  z  to  obtain  a  monic  polynomial  Af/  €  P"  such  that 

A<?(A)u;_,  =  0,  l<i<J. 

For  t  €  (0,1),  consider  now  the  polynomial  q^  6  P"  defined  by  the  convex  linear 
combination 

If  Wjj^^^. .  .,Wf^  denote  the  remainder  of  a  set  of  A'^  singular  vectors  of  q{A),  with 
corresponding  singular  values  C  >  ctj^.]  >  •  ■  •  >  cr^^r  >  0,  then  we  have 

r(i-e)c  (i<j<J), 

^"   -    l(l-£)a,^,  +  e||Ar;(.4)||     (J+1<;<A'). 

The  first  row  is  <  C  for  arbitrary  e,  and  the  second  row  is  <  C  for  sufficiently 
small  e,  since  cr j^\  <  C .  Since  the  singular  vectors  lUj , . . . ,  tt'^-  form  an  orthonormal 
basis  for  R  ,  this  implies  that  ||9((i4)||  <  C  for  sufficiently  small  e,  contradicting 
the  assumption  that  q^  and  qo  are  minimal. 

For  the  ideal  GMRES  problem  (3)  the  argument  is  the  same  except  that  from 
92~9l  ^^  need  to  construct  A9  €  P„  rather  than  A9  G  F".  If  the  constant  term 
of  qo  —  qi  is  nonzero,  we  do  this  by  dividing  by  that  constant  term.  If  it  is  zero, 
we  make  use  of  the  assumption  that  A  is  nonsingular  and  multiply  by  a  suitable 
power  of  z~^ .    | 


6.  Computations 

If  A  is  normal,  the  ideal  Arnoldi  and  GMRES  polynomials  q'  and  q,  are  simply 
Chebyshev  polynomials  for  the  set  A(.4).  as  noted  in  §4.  and  can  be  computed  by 
various  algorithms.  If  A  i.s  not  normal,  however,  we  know  of  no  simple  algorithm 
that  is  guaranteed  to  compute  q*  and  q,.  For  simplicity,  from  now  on  we  shall 
consider  the  ideal  GMRES  polynomial  q,;  our  remarks  carry  over  straightforwardly 
to  the  ideal  Arnoldi  problem. 

We  have  found  that  in  many  cases,  q,  can  be  computed  by  using  an  opti- 
mization code  to  determine  an  initial  vector  6,  with  ||6||  =  1,  for  which  ||p*(>l)6|| 
is  maximal  at  the  prescribed  step  n.  From  Theorem  1  we  have 

||p.(-4)6||<  \\qAA)\\  <  \\pAA)\\  (7) 
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Figure  1.  Convergence  curves  for  the  16  x  16  Lenferink-Spijker  matrix. 
The  upper  curve  corresponds  to  ideal  GMRES  and  the  lower  curves  to 
standard  GMRES  with  five  different  random  initial  vectors  b. 

for  any  b  with  ||61|  =  1.  If  a  choice  of  b  can  be  found  for  which  ||p^.(/l)  6||  =  ||p,(y4)||, 
it  follows  that  p^  =  9».  It  is  not  known  whether  such  a  b  always  exists,  but  we 
conjecture  that  it  does  (see  the  first  question  of  the  next  section).  Maximizing  the 
left-hand  side  of  (7)  seems  to  be  easier  in  practice  than  minimizing  the  right-hand 
side,  presumably  because  the  latter  problem  is  non-smooth.  We  have  carried  out 
our  computations  using  the  Matlab  optimization  routine  fmhiu  [14]  coupled  with  a 
GMRES  subprogram  to  compute  ||p«(>l)6||  for  a  given  vector  b.  Although  several 
attempts  with  different  initial  guesses  b  are  often  required  for  the  optimization 
code  to  succeed,  it  usually  does  so  eventually. 

As  an  alternative  to  the  use  of  a  general  optimization  program  such  as  fminu, 
we  have  found  that  one  can  also  maximize  the  left-hand  side  of  (7)  by  means  of 
an  algorithm  modeled  on  Lawson's  algorithm,  which  is  a  method  of  iteratively 
reweighted  least-squares  best  known  in  the  context  of  V  approximation  [18].  We 
have  obtained  good  results  this  way  in  many  cases.  However,  we  have  not  yet 
found  a  method  of  this  kind  that  converges  consistently,  and  will  therefore  not 
give  details  here. 

Figure  1  illustrates  the  behavior  of  the  ideal  vs.  true  GMRES  polynomials  by 
a  very  simple  example — a  matrix  of  Lenferink  and  Spijker  [11,22].  This  is  a  non- 
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normal  tridiagonal  matrix  of  the  form  tridiag{(i  +  \)~^ ,  —  3  — 2z,  2  +  1),  i  =  I,. .  .,N. 
For  the  Lenferink-Spijker  matrix  of  order  A'  =  16,  we  computed  the  ideal  GMRES 
polynomials  g»  of  degree  1  through  15  as  well  as  the  true  GMRES  polynomials  p, 
for  five  different  random  initial  vectors.  The  thick  curve  in  the  figure  represents 
the  norms  ||g^(/l)||  as  a  function  of  n,  and  the  thinner  curves  represent  ||p,(y4)6|| 
for  the  various  vectors  b.  The  GMRES  curves  lie  below  the  ideal  GMRES  curve, 
as  they  must,  but  exhibit  qualitatively  the  same  shape.  Our  experiments  have 
not  been  sufficiently  extensive  to  draw  conclusions  about  how  GMRES  and  ideal 
GMRES  convergence  curves  compare  in  general. 

It  is  interesting  to  note  that  while  the  norm  of  the  ideal  GMRES  polynomial 
for  this  problem  decreases  strictly  monotonically,  this  does  not  always  happen.  For 
many  problems,  ||g,(/l)||  remains  exactly  equal  to  1  for  a  number  of  steps  before 
it  begins  to  decrease.  The  case  7(  =  1  of  this  phenomenon  is  fully  understood:  one 
can  show  that  ||9«{^)||  <  1  at  step  1  if  and  only  if  the  field  of  values  of  A  lies  in  an 
open  half-plane  with  respect  to  the  origin  in  C  (see  [3]  and  §6  of  [15]).  For  some 
problems,  ||9»(v4)||  =  1  for  steps  1  through  N  —  1.  This  happens  frequently  with 
random  matrices,  for  example.  For  such  problems  there  is  a  right-hand  side  vector 
b  for  which  the  GMRES  algorithm  makes  no  progress  whatsoever  until  step  A'. 

A  difference  we  have  observed  between  the  ideal  GMRES  polynomials  for 
normal  and  non-normal  matrices  is  the  following.  If  A  is  normal,  then  (j*iA)  must 
have  at  least  n  -f  1  equal  maximal  singular  values.  This  follows  from  the  fact  that 
the  degree  n  Chebyshev  polynomial  for  a  set  in  the  complex  plane  always  takes  on 
its  maximum  absolute  value  in  at  least  n  +  1  distinct  points.  In  contrcist,  when  A  is 
not  normal,  our  experiments  indicate  that  after  the  initial  phase  with  ||g»(v4)l|  =  1 
just  mentioned.  q^{A)  usually  has  only  one  maximal  singular  value. 

7.  Open  questions 

Our  work  on  ideal  GMRES  and  Arnoldi  approximations  has  raised  more  ques- 
tions than  it  has  answered.  We  shall  close  with  a  list  of  four  questions  that  we 
consider  particularly  interesting.  In  each  of  the  following  A  is  a  matrix,  6  is  a  vec- 
tor normalized  by  ||6||  =  1,  and  the  convergence  curve  is  the  curve  of  ||p,(/l)fc||  or 
||9«(>l)||  as  a  function  of  the  step  number  n.  The  questions  are  posed  for  GMRES, 
but  they  all  have  Arnoldi  analogues. 

1.  Is  the  envelope  attained?  Theorem  1  asserts  that  the  GMRES  convergence 
curve  lies  below  the  ideal  GMRES  convergence  curve,  as  illustrated  in  Figure  1. 
Given  n,  does  there  exist  an  initial  vector  h  such  that  these  two  curves  intersect 


at  step  n,  i.e.,  such  that  ||p,(A)6||  =  ||9»(A)||?  The  answer  is  known  to  be  yes 
for  symmetric  matrices  [5]  and  more  generally  for  normal  matrices  [7,10],  and  for 
arbitrary  matrices  at  step  n  =  1  [7,10].  It  is  also  yes  in  the  "generic"  non-normal 
case  in  which  the  maximal  singular  value  of  q^{A)  is  simple.  Whether  it  is  yes 
in  all  cases  is  not  known.  If  it  is,  then  the  ideal  GMRES  convergence  curve  can 
be  described  precisely  as  the  upper  envelope  of  the  GMRES  convergence  curves 
corresponding  to  all  initial  vectors  b. 

2.  What  convergence  curves  are  possible?  At  step  n,  \\q*{A)\\  must  be  at  least 
as  small  as  niini<jt<„_i  ||9ji.(yl)||||9„_fc(/l)||,  where  q^  denotes  the  ideal  GMRES 
polynomial  for  A  at  step  k.  Geometrically  this  means  that  the  ideal  GMRES 
convergence  curve  is  convex  when  plotted  on  a  logarithmic  scale.  Are  all  conver- 
gence curves  that  satisfy  this  convexity  constraint  possible?  If  not,  how  can  one 
characterize  those  convergence  curves  that  are  possible? 

3.  Can  any  matrix  be  simulated  by  a  normal  matrix?  Short  of  a  full  character- 
ization of  convergence  curves,  one  may  naturally  ask  if  the  possibilities  for  normal 
matrices  are  more  restricted  than  for  non-normal  matrices.  In  other  words,  are 
there  convergence  curves  that  can  only  be  generated  bj'  a  non-normal  matrix? 
For  standard  GMRES  the  answer  has  recently  been  proved  to  be  no;  any  sequence 
||p«(yl)6||  as  a  function  of  n  can  be  duplicated  by  another  sequence  ||p,(j4)6||  where 
A  is  normal — in  fact,  unitary  [6].  For  ideal  GMRES,  the  answer  is  unknown. 

4-  Do  the  pseudospectra  determine  the  convergence  curve?  Our  final  question 
returns  to  the  topic  of  whether  matrix  iterations  are  equivalent  to  approximation 
problems  in  the  complex  plane.  No  such  equivalence  has  yet  been  found  in  the 
case  where  A  is  not  normal.  In  particular,  the  spectrum  of  A  does  not  determine 
the  ideal  GMRES  convergence  curve,  and  neither  do  the  Jordan  canonical  form  or 
the  field  of  values.  However,  what  about  the  e-pseudospectra  of  A,  that  is,  the  sets 
A,{A)^{zeC:  \\{zI-A)-'^\\>t-^}  [15,21,22]?  Can  two  matrices  have  identical 
e-pseudospectra  for  all  e>0  but  distinct  ideal  GMRES  convergence  curves?  This 
is  a  special  case  of  the  following  more  general  question.  If  Af.(A)  —  Af.(B)  for  all 
c  >  0,  where  A  and  B  are  two  matrices  (not  necessarily  of  the  same  dimension), 
does  this  imply  ||/(y4)||  =  ||/(J5)||  for  all  analytic  functions  /?  (Without  loss  of 
generality  we  can  be  take  /  to  be  a  polynomial.)  In  other  words,  if  "behavior" 
is  measured  by  norms  of  functions  of  matrices,  do  the  pseudospectra  of  a  matrix 
determine  its  behavior? 
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